Preface



The American Association of Colleges for Teacher Education (AACTE)
is pleased to publish this paper as one of a series of monographs
sponsored by its Committee on Performance-Based Teacher Education. The
series is designed to expand the knowledge base about issues, problems,
and prospects regarding performance-based teacher education as identified
in the two papers on the state of the art developed by the Committee
itself.¹,²

Whereas these two papers are declarations for which the Committee 
accepts full responsibility, publication of this monograph (and the 
others in the PBTE Series) does not imply Association or Committee en- 
dorsement of the views expressed. It is believed, however, that the 
experience and expertise of these individual authors, as reflected in 
their writings, are such that their ideas are fruitful additions to the 
continuing dialogue concerning performance-based teacher education.

This monograph addresses one of the critical problems in designing
and implementing performance-based teacher education programs, namely,
the assessment of teacher performance. The problem, however, is not
unique to PBTE. All of teacher education faces the problem of evaluating
program effectiveness through the assessment of the performance of
graduates. The design presented is a significant addition to the literature
not only about PBTE but about all teacher education.

AACTE acknowledges with appreciation the role of the National Center
for Improvement of Educational Systems (NCIES) of the U.S. Office of
Education in the PBTE Project. Its financial support (provided through
the Texas Education Agency) as well as its professional stimulation,
particularly that of Allen Schmieder, are major contributions to the
Committee's work. The Association acknowledges also the contribution of
members of the Committee who served as readers of this paper. Special
recognition is due Lorrin Kennamer, Committee Chairman; David R.
Krathwohl, member of the Committee and chairman of its publications task
force; and to Shirley Bonneville and Jane Reno of the Project staff for
their contributions to the development of this publication.

EDWARD C. POMEROY                    KARL MASSANARI
Executive Director, AACTE            Associate Director, AACTE
                                     and Director, PBTE Project



¹Stanley Elam, Performance-Based Teacher Education: What Is the
State of the Art? (Washington, D.C.: The American Association of
Colleges for Teacher Education, December 1971).

²AACTE Committee on Performance-Based Teacher Education, Achieving
the Potential of Performance-Based Teacher Education: Recommendations
(Washington, D.C.: The American Association of Colleges for Teacher
Education, February 1974).
Introductory Note



Does the PBTE movement have its feet on the ground? If so, it has a 
couple of very important Achilles' heels: 1) the problem of measuring 
or assessing the performances or competencies and 2) once they can be 
measured or assessed, their validation as behaviors that make a differ- 
ence in student learning. The PBTE Committee, which sponsors this mono- 
graph series, has stressed these problems in the recommendations of Mono- 
graph No. 16, which summarized its first three years of work. Often in- 
correctly perceived as an advocate of PBTE, the Committee in reality is 
concerned that PBTE be properly implemented as one means of teacher
education. Then let's see what it can contribute. Such an appropriate
trial cannot come about unless the two problems mentioned above are much
closer to solution than at present.

Do these problems mean that we should not try to implement such programs 
until then? Not at all. However, the existence of the problems suggests 
that making everyone conform to a PBTE mode is unwarranted. But, through 
carrying program development as far as we can and doing the best possible
evaluation, we can begin to build research into these programs that
will help us to determine what characteristics make a difference.

Therefore, this monograph is viewed by the Publications Subcommittee as
one of its most important. The exploration of the psychometric realities
of assessment, which is in the early part of the monograph, is a problem
that workers in the field intuitively sense, but I know of no place where
it has been laid out as it is here.

Here are some quotations to suggest what is in store for you in this
monograph:

It seems probable that there is a negative relationship between the
social importance of a specific goal and the length of time it takes
the average pupil to show appreciable growth toward it.

On the reliability of measures of class gain to assess a teacher: 

the (research) data are sparse but consistent. ... (and indicate that) 
we would have to test each teacher in at least 20 different classes 
...to obtain... minimally acceptable reliability 

On observer measures: 

...it is critical to establish empirically that... (the) items do
belong together... because it is quite likely that some items that
appear to belong together... (conceptually) will not hang together
empirically. ...This is what happened to us when we tried to assess
teacher control....

Enough to intrigue you? There is a lot more! 

The monograph contains important conceptual points and, from three of the 
very best researchers in the field, much sage advice for successful work 
in assessing and validating teacher behaviors.

A topic of this kind is not easy to develop without assuming some back- 
ground on the part of the reader. The authors have assumed as little as 
possible. We have tested their monograph for readability and find that 
students with one course in tests and measurements have not had trouble 
in comprehending it. 

For all these reasons and more, we commend it to your attention. 



DAVID R. KRATHWOHL, Member of the
PBTE Committee and Chairman of its 
Task Force on Publications 
Introduction



If there is a single word that describes the role of assessment in 
a performance-based teacher education (PBTE) program, the word is crucial. 
The very name implies that the decision base in such a program is perfor- 
mance, that is, demonstrated competence. Decisions about the routing and 
progress of a student through a PBTE program and out of it into the public
schools are, then, by definition based on assessments of teacher perfor- 
mance. 

This fact is generally accepted; in two seminal publications of the 
American Association of Colleges for Teacher Education's PBTE Committee 
(Elam, 1971, pp. 6-7; The AACTE Committee, 1974, pp. 7, 30), the cruci- 
ality of the role of teacher performance assessment is emphasized as part of 
the definition of performance-based instruction itself. 

In the development of most of the PBTE programs in current operation,
much more attention has been paid to such problems as those of developing
modules and reorganizing instruction than to the development of adequate
assessment procedures. The probable consequences of this are also
cited by the Committee (op. cit., pp. 40-41); a particularly forceful
statement by Krathwohl says that:

One can predict that performance-based teacher education
(PBTE) is certain to fail to reach its ultimate objective if 
it continues on its present course. This failure will be 
caused by the almost complete lack of attention given to the 
assessment of teaching competencies, a core concept of PBTE. 
(Merwin, 1973, p. v) 

We suspect that this neglect of the assessment problems is not
entirely the result of conflicting demands on program developers' time. Our
contacts with these harried individuals reveal that they have a strong
reluctance to tackle these problems because of feelings of inadequacy arising
from the lack of knowledge of adequate approaches to solutions. It will be
the purpose of this monograph to make some suggestions designed to allay
these feelings. Little enough is known about the assessment of teacher
competence to indicate that even the modest proposals we feel qualified to
make may be useful to program developers.

We will begin by presenting a paradigm or cognitive map of the area 
and by attempting to make the task more manageable by breaking it down 
into smaller subtasks. Specifically, we shall deal separately with the 
three principal areas: (1) developing techniques for assessing teacher
performance, (2) specifying the competencies to be assessed in measurable 
terms, and (3) validating the program by validating the competencies it 
develops in its graduates. 

Each of these tasks is too complex, too difficult, to be treated 
adequately in the space available. Nor, for that matter, is enough known 
about any one of the topics to justify an attempt at definitive treatment.
All that we can possibly claim is that we have tried to shed some light on
each by drawing on our own experience as researchers in teacher behavior.



A Simple Paradigm 

It will be convenient to distinguish four different levels in the 
teacher's professional development at which the teacher may be assessed,
as shown in Figure 1. 



[Diagram: four rectangles joined left to right by arrows]

Level I: Training Experiences  ->  Level II: Teacher Performance  ->
Level III: Pupil Experiences  ->  Level IV: Pupil Outcomes

Figure 1. Assessment Levels in Teacher Education



Level I refers to assessments of the training experiences the
teacher has had. What courses has he taken? What modules has he attempted?
Which ones has he mastered? Which has he bypassed on the basis of having
demonstrated mastery of its objectives beforehand?



Level II refers to assessments of the teacher's behavior while he
is attempting to fulfill the role of a teacher. What kinds of questions
does he ask in interaction with pupils? How does he organize his class
for instruction? How does he determine the objectives of instruction?

Level III refers to assessment of the behaviors of pupils under the
guidance of the teacher being assessed, that is, assessments of the
experiences they have which we all know must form the basis for any learning
that takes place. What kinds of tasks do the pupils perform, in or out of
class? How much time do they spend in active participation in class
discussions? How often does each child receive reinforcement and for what?

Level IV refers to assessments of the outcomes of instruction, of
those changes in behavior that it is the purpose of education to bring
about. How well does the pupil read? What are his attitudes toward
independent learning in adult life? What kind of a citizen does he become?



Each of these four assessment levels is represented in the diagram
by a rectangle and the rectangles are joined by arrows representing lines
of influence or of cause-and-effect. Thus outcomes (Level IV) are seen
as influenced by, in part the result of, pupil behavior (Level III), of
learning experiences the pupil has while in school; these experiences in
turn are seen as at least partly determined by what his teacher does
(Level II); and, finally, the way the teacher behaves, or "teaches," is
affected by the experiences he has had during training (Level I). All of
these things are, of course, strongly influenced by other factors not
shown in the figure, e.g., community, school, pupil, and teacher
characteristics.

The whole enterprise of teacher education is, of course, based on 
the assumption that, despite these extraneous factors, the influences
that are assessed at each stage are potent enough to have an appreciable
effect not only on the level immediately following but on all subsequent
levels. The concept of teacher effectiveness, in particular, is based
on the notion that pupil learning outcomes (Level IV) are affected by
teacher behavior (Level II). And the justification for the very existence
of teacher education is the presumption that what happens to a teacher in
training (Level I) can somehow increase his effectiveness, that is, affect
pupil learning outcomes (Level IV). 

Implications of the Paradigm 

The very existence of intermediate Levels II and III suggests that
the effects of training on teacher effectiveness are subject to
attenuation and are, therefore, difficult to establish. Any impact on pupil
learning (Level IV) of teacher behavior (Level II) must be achieved through
pupil behaviors (Level III). Two teachers who behave identically will
achieve the same outcomes only if their pupils also behave in the same way.
If we attempt to relate teacher behavior to pupil learning without paying
some heed to what the pupils are doing, we should not be surprised to find
that the correlations tend to be low.

In the same way, the relationship between teacher education (Level I)
and effective instruction (Level IV) depends on what teachers and pupils do
in Levels II and III. We must train teachers to behave in such a fashion
that their pupils will behave in such a fashion that the pupils will learn
more!

The Concept of PBTE 

We now see that the essential innovation involved in performance- 
based teacher education is a simple one: when program decisions are based 
on assessments, they should be made at some level higher than Level I. In
the past, decisions about when a teacher education student is ready to 
graduate or to be certified, promoted, to receive merit pay, or the like, 
have been based mainly on Level I assessments: on what courses the teacher 
has had, what degrees or other evidence of training he can present, or (on
the assumption that experience is the best teacher of teachers) on how much 
experience he has had. The PBTE notion is that such decisions must be 
based on demonstrated competence rather than on evidence of training or ex- 
perience supposedly related to competence. The problem is: how and at 
what level should competency be assessed? One controversial issue is
whether competence can be assessed at Level II or whether it must be as- 
sessed at Level IV. It is to this question that we shall next address our- 
selves. Let us first discuss the feasibility of Level IV assessment. 

Assessing Teacher Competence in Terms of Pupil Outcomes (Level IV) 

It seems to us that much of the enthusiasm for PBTE manifested in the
past is based on the impression that Level IV assessment will be used.
Claims that a PBTE program would graduate only teachers who have actually
demonstrated their competence to teach have been taken as meaning that
only teachers of proven effectiveness would be turned out in PBTE programs.

This concept has a very attractive sound to the school administrator,
the state certification officer, the legislator, the taxpayer. The factory 
that manufactures television sets releases only functioning sets; quality 
control insures this. Why should the college of education not do the same? 

In a period of time when accountability has become a catchword, such
a proposition seems particularly reasonable. Why should we who train
teachers not require each and every candidate for certification to demonstrate
that he can teach something to real pupils and have them learn it? 

It is our contention that, attractive though it may sound,
assessment at Level IV in teacher education is not a viable strategy. To
support this claim, it is only necessary to ascertain the degree to which
Level IV measures are likely to possess the three essential
characteristics of a useful test or other measuring device: validity, reliability,
and practicality. Let us examine each of these terms as they apply to
Level IV assessments as the basis for decisions in performance-based
teacher education and certification.

Let us begin by discussing the validity of such measures of
teacher effectiveness.

Validity of Level IV Measures 

Before we go any further, let us make clear what we mean by a Level
IV measure, which we shall call a measure of teacher effectiveness. Such
a measure is based on pupil gains on a test or other measure of the
outcomes of instruction. Typically, a group of pupils is pretested, taught
by the teacher for a prescribed period of time, and posttested. The mean
gain, usually adjusted statistically to eliminate such influences as
pretest and ability, is taken as the measure of outcomes and, in this case,
of the effectiveness of the teacher. Teacher effectiveness must be
measured in terms of effects on pupils.

The validity of so direct a measure of a teacher's ability to get
pupils to learn seems self-evident and it may be for the limited type of
learning usually assessed in this manner, but not necessarily for any
other type. Most serious attempts to use Level IV assessments, or
"teaching tests," as devices for measuring teachers (cf. Flanders, 1974;
Popham, n.d., 1971) have for obvious reasons used brief periods
of instruction: a few hours or a few days at the most. This obviously
limits them to measurements of the kinds of effects on pupils that can
be detected in a relatively short time.

We know of no systematic research into the length of time it takes
to produce measurable gains toward various types of objectives of
instruction. Such research is badly needed. In any case, it seems likely
that much less time is needed to teach some facts, especially if they
need to be retained only long enough to pass a unit test, than to help
pupils become self-directed learners, to serve, or to become responsible
citizens. It also seems probable that there is a negative relationship 
between the social importance of a specific goal and the length of time 
it takes the average pupil to show appreciable growth toward it. 

This means that "teaching tests" of this type can validly measure 
how effective a teacher is in achieving only short-term goals, which are 
almost certainly the least important goals of education. The adoption of 
this type of measure of teacher competence would systematically select 
as "best" those teachers who are good at teaching facts to pupils, facts 
the pupils may well forget as soon as they have passed the unit test; 
while the teacher who sacrifices this kind of learning for the sake of 
achieving more important "higher level" outcomes might tend to be elim- 
inated or at least scored as less effective. If such an approach could 
and did work, its effects on public education could be disastrous. 

Little is known about the structure of teacher effectiveness; it 
may be that the teacher who is most effective in teaching pupils facts 
is also most effective in teaching them to analyze and synthesize, to 
appreciate literature, etc.; but this supposition is at best doubtful.
We shall later cite some evidence that it is not so. For the present,
let us conclude that the validity of "teacher tests" of ability to
achieve short-term outcomes as predictors of overall teacher
effectiveness is by no means self-evident. If Level IV measures are to be used,
they should be based on measures of multiple outcomes over a period
of teaching long enough to detect progress toward long-term goals (a
minimum of a semester or two) before we may assume them to be valid.

Short-term teacher tests seem likely to measure something more
like coaching or cramming skill than teacher effectiveness as usually
conceived. In any case, their validity as predictors of overall
effectiveness must be empirically demonstrated before their use is justified.

When standardized achievement tests are used to measure long-term
pupil gains as a basis for teacher evaluation, additional problems emerge.
For one, test validity is threatened by the well-known tendency to "teach
to the test," that is, to emphasize the specifics measured by the test.
The content validity of such a test is based on the assumption that the
items on the test sample a large domain of items, so that the performance
of the pupil on the test (his obtained score) is an unbiased estimator
of his performance on all items in the domain (his true score). When a
teacher teaches to the test, this assumption is untenable and the validity
of the test is destroyed and with it the validity of the measure of
teacher effectiveness based on it.

A recent illustration of the power of this effect is provided by an
OEO study of performance contracting in which what looked like evidence
of success vanished when control was exercised to eliminate "teaching for
the test" (Page, 1972).

Unless such controls are used, evaluation of teachers based on mean
gains of pupils is likely to identify as most competent a type of teacher
nobody wants.



The suggestion has been advanced that instead of using mean gains 
on achievement tests we use the number of pupils in a class who achieve 
mastery of the tested material at some specified minimum level of achieve- 
ment, but this presents problems as well. Small (1972) has documented 
from the history of accountability in England a century ago the fact that 
when teachers are evaluated on this basis, they tend to focus their
efforts on pupils at or near the specified level to the detriment of pupils
at either higher or lower levels. Once again we run the risk of reward- 
ing the wrong kind of teaching. 

The general problem is that attempts to evaluate teachers on the 
basis of pupils' test performance tend to focus the teaching too narrowly 
on the specifics measured by the test. 

Reliability of Level IV Measures 

Before commenting on this topic, let us agree on what the term
reliability means. By a reliable measure we mean one that yields an obtained
score quite close to the true score of the person measured. In the case of
a Level IV measure, the true score of a teacher would be the adjusted mean
gain score of all pupils in some population of pupils all of whom had been
taught by the teacher being assessed for the prescribed length of time under
the prescribed conditions, etc. Clearly, the principal source of error of
measurement in this instance arises from differences among pupils. A
teacher who can teach something to one child with ease might have difficulty
teaching the same thing to another child, particularly if the two children
differed in ability, interest, sex, race, or socioeconomic status. At
present we tend to train teachers to teach anybody. We may limit the grade
or subject a teacher is to teach, but we do not, as a rule, restrict a
teacher to pupils at a certain level of IQ, of a certain sex or race, level of
interest, etc. The population of pupils on which the "true" score under
discussion is based must, then, be regarded as quite heterogeneous in these
respects.

Suppose, now, that each teacher to be assessed is required to teach 
a certain unit to one class of pupils and that his effectiveness with that 
class has been ascertained. How reliable is such a score? Its measure
of stability could be estimated by having each of the teachers teach not 
just one but two classes regarded as randomly drawn from the total popu- 
lation of pupils and correlating the two sets of mean gain scores. 
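
The procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gain scores below are hypothetical values invented for the example (they are not data from any study cited here), and the helper name pearson_r is ours.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# HYPOTHETICAL adjusted mean gains for six teachers, one value per teacher.
# Each teacher taught the same unit to two different classes.
gains_first_class = [2.1, 0.4, 1.3, -0.5, 0.9, 1.8]
gains_second_class = [0.7, 1.2, 0.2, 0.6, 1.9, 0.3]

# The stability coefficient is the correlation between the two sets of gains.
stability = pearson_r(gains_first_class, gains_second_class)
print(f"stability coefficient: {stability:.2f}")
```

A coefficient near 1 would mean that teachers keep nearly the same rank order from one class to the next; a coefficient near 0 would mean that a teacher's standing with one class tells us little about his standing with another.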

Fortunately, some data reporting just such stability coefficients 
exist. Rosenshine (1970) has reviewed five published reports of such
studies, and Veldman and Brophy (1974) report some new data. The median
reliability coefficient in the studies reviewed by Rosenshine is .32;
the median coefficient reported by Veldman and Brophy is .27.*



*The value of .32 is a crude median of 13 coefficients reported in
Rosenshine's Table 1. The value of .27 is the crude median of 30
coefficients in Veldman and Brophy's Table 5; because these are reliabilities
of the mean of 3 measures per teacher, the computed median (.52) was
reduced by the Spearman-Brown formula to estimate the reliability of a single
measure. More refined methods yielded very similar estimates.
These data are sparse, but surprisingly consistent and imply that
if we use the mean gains of one class of pupils taught by a teacher as 
a measure of teacher effectiveness, our "teaching test" may be expected 
to have a reliability coefficient of about .3. This means that more than 
90 per cent of the variance in such scores must be attributed to unknown 
influences— to chance. The competence of the teacher assessed accounts 
for less than one-tenth of the variance. Not a very good basis for pro- 
gram decisions! 

Most textbooks in tests and measurements recommend that the mini- 
mum reliability for a test to be used to measure individuals should be 
.90 or .95. To achieve this standard, we would have to test each teach- 
er in at least 20 different classes in order to obtain a measure of 
minimally acceptable reliability (.90) with a teaching test— a procedure 
which is out of the question. 
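
The arithmetic behind the figure of 20 classes, and behind the reduction of .52 to .27 in the footnote, can be checked with the Spearman-Brown formula. The sketch below is a minimal illustration; the function names are ours, and the only numbers used are the coefficients already quoted in the text.

```python
def spearman_brown(r_single, k):
    """Reliability of the mean of k parallel measures, each of reliability r_single."""
    return k * r_single / (1 + (k - 1) * r_single)


def single_from_mean(r_mean, k):
    """Invert Spearman-Brown: reliability of one measure, given that of a mean of k."""
    return r_mean / (k - (k - 1) * r_mean)


def classes_needed(r_single, r_target):
    """Lengthening factor: how many classes must be averaged to reach r_target."""
    return r_target * (1 - r_single) / (r_single * (1 - r_target))


# Footnote: median reliability of a mean of 3 measures was .52
print(f"single-class reliability: {single_from_mean(0.52, 3):.2f}")

# Text: single-class reliability of about .3, target of .90
for r in (0.27, 0.30, 0.32):
    print(f"r = {r:.2f}: about {classes_needed(r, 0.90):.1f} classes needed")
```

With a single-class reliability of .30 the lengthening factor comes to about 21 classes, and with .32 it comes to about 19, which is consistent with the figure of "at least 20" given above.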

Quite aside from the question of their validity, Level IV measures 
of teacher effectiveness are of doubtful value, then, because of their 
extremely low reliability.

Practicality of Level IV Assessments

The practicality of a measuring device has to do with such matters
as how much it costs; how long it takes to administer and score it; what
its use requires in terms of materials, personnel, and the like; and the
availability of alternate forms.

The ideal measure of teacher competence for use in a
performance-based teacher education program should be usable not only in the terminal
or summative evaluation of the teacher, but also in the formative stages to
provide a basis for routing the student through the program. In a
modular program in particular, there is need for instruments designed to
measure the competence each module is intended to develop, instruments that
can be used as pretests to determine whether the teacher should enter or
bypass any given module and also as posttests to ascertain whether he has 
achieved the goal of a module before he passes on to another. 

It should be clear that a Level IV measure of teacher effectiveness, 
one based on pupil outcomes, is not a very practical device for such pur- 
poses for at least two reasons. One is that, by its nature, such a test 
does not focus on a single competence— the complexity of the teaching act 
means that a number of competencies are involved in teaching the simplest 
concept, even to the smallest group of pupils. If it were possible to 
simplify the teacher's task to such a degree that one competency alone
were used, the situation would be so artificial that the intrinsic
validity of a Level IV measure would be lost.

The other factor limiting the practicality of Level IV measures is
that they are far too cumbersome to use. Even in its simplest form, this
type of "teaching test" takes a number of hours to administer and requires
the time not only of the student but also of several pupils, who must be
on call and available whenever a student reaches that point in his own
individual progress at which he needs them. Normally, the same children
may not be used more than once in the same test so that a truly gigantic
pool of pupils must be available. There is no need to elaborate further;
the impracticality of such an approach should be obvious.

If the teaching performance test we have been discussing has any 
use at all, it may be of use at the end of the individual's training.
After a student has completed a program and may, therefore, be presumed
to have acquired all of the competencies needed to be certified, such a
test might be administered to find out whether he is able to put them all 
together— to deploy what he has learned effectively in dealing with a teach- 
ing problem. But except for this specific purpose, we suggest that any- 
thing defensible as a Level IV measure is not practicable enough to be use- 
ful; even in this one application, the validity and reliability problems 
are such as to make its utility very doubtful. 

The Morality of Level IV Assessments 

Quite aside from the pragmatic questions discussed above, we have
some philosophical reservations about Level IV assessments. Use of such
devices effectively makes the advancement of one human being, the teacher,
dependent on the behavior of another human being, the pupil. The
teacher's future depends on events which are not, and should not be,
entirely under his control. The teacher's self-interest requires him to
manipulate pupils so that they will behave in ways that will result in a 
favorable evaluation of the teacher. The resultant pressures on the pupils 
are all the more repugnant in that the pupils may be unaware of them, and 
constitute no less of a threat to their human rights. The pupil has the 
ultimate right not to learn— not to behave in the fashion prescribed for 
him by the teacher or school. And in order to be evaluated as competent,
the teacher is virtually forced to violate this right. 



Assessing Teacher Competence Based on Pupil Behavior (Level III) 

Even though the relations between teacher behavior and pupil
behavior (Level III) are likely to be higher than those between teacher
behavior and pupil outcomes, it seems to us that pupil behavior should also
be dismissed from consideration as a basis for evaluating teaching. 

Assessment at Level III involves the same problem of morality as 
assessment at Level IV, as well as others. It seems to us that the ulti- 
mate responsibility of the teacher is to provide pupils with the oppor- 
tunity to learn, not to "make" them learn. The competent teacher would 
be the one who could maximize the opportunity afforded each pupil under 
his care to learn what he needed to learn: the one who could (1) diagnose
pupils' needs and capabilities, (2) prescribe appropriate learning
activities, that is, those most likely to result in learning for each
pupil, and (3) work with the pupil in such a way that he would be most
likely to experience those activities. But these three activities are
teacher behaviors and a part of Level II. Because Level III shares so
many of the problems which Level IV has, it seems to us that neither of
them is an appropriate level for assessment.
In summary, pupil outcomes are not a satisfactory basis for
evaluating teaching. Although the use of relatively short time periods for
evaluating teaching has been advocated, this procedure is of questionable 
value because the results of short teaching periods are not known to re- 
late to those of longer periods and the material taught in a brief time 
is likely to be simple and factual rather than more complex or abstract. 
We do not know that teaching facts and more complex material require the 
same skills and there is some evidence that they do not. The results of 
short-term teaching units, therefore, risk being irrelevant or even mis- 
leading. 

Nor does the use of year-long time periods solve this problem. The 
evidence determined by correlating the gains made by two classes taught 
by each of a series of teachers indicates that the data from about 20 
classes of pupils would be required to reach the minimum standards of re- 
liability usually required for making decisions about individuals. 
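
The arithmetic behind a figure of this kind can be sketched with the Spearman-Brown formula, a standard way of projecting how reliability grows as independent measurements are pooled. The sketch is illustrative only: the single-class reliability of 0.30 and the target of 0.90 are assumed values, not figures reported in the studies referred to above, and the function names are hypothetical.

```python
def pooled_reliability(r_single: float, k: float) -> float:
    """Spearman-Brown: reliability of the mean of k parallel measures."""
    return k * r_single / (1.0 + (k - 1.0) * r_single)


def classes_needed(r_single: float, r_target: float) -> float:
    """Solve the Spearman-Brown formula for k, the number of classes."""
    return r_target * (1.0 - r_single) / (r_single * (1.0 - r_target))


# Assumed values for illustration only (not taken from the source).
r_one_class = 0.30  # correlation between gains of two classes of one teacher
r_required = 0.90   # an often-cited standard for decisions about individuals

k = classes_needed(r_one_class, r_required)
print(f"classes needed: about {k:.0f}")
print(f"reliability with that many classes: {pooled_reliability(r_one_class, k):.2f}")
```

Under these assumed values the formula calls for roughly 21 classes per teacher, which is of the same order as the figure cited above; a lower single-class reliability would push the requirement higher still.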

Finally there are the problems of preventing the teacher who is to
be evaluated from teaching the test and of the likelihood that the
teacher will concentrate her efforts on teaching pupils near the
criterion, if the number of pupils meeting some minimum standard is the
measure of teacher competence. Either result would be too narrowly
focused.

All in all, evaluating teaching by testing pupils seems unsatisfactory
or even damaging, and it is hard to see how modification could make
it functional.



Assessing Teacher Competence Based on Performance (Level II)

Since Level II assessment, assessment based on measures of teacher
performance, seems at present to be the only viable course open,
we shall proceed on the assumption that Level II assessments will be the
principal basis for decision-making within a PBTE program. As we shall
see, there is reason to believe that such measures can be practical,
reliable, and objective enough to meet the assessment needs of such a
program. Their adoption does, however, leave us with a clear
responsibility for establishing the validity of the measures.



Some Characteristics of Adequate Performance Measures

There are three distinct steps involved in obtaining an accurate
measure of teacher performance: (1) a sample of the relevant behavior
must be obtained; (2) a scorable record of the behavior must be made;
and (3) the record must be quantified.

In order to obtain a relevant behavior sample, we must put the
teacher in a situation in which he has an opportunity to use the
competency in question. Perhaps the best strategy is to put him in
something very like the "teaching test" situation described under Level
IV assessment, that is, give him a teaching task to perform, or a
teaching problem to solve, which clearly requires the use of the
competency to be assessed.

Since the assessment is to be based, not on pupil outcomes but on
whether the candidate follows the best known practice (uses the strategy
our best knowledge identifies as optimal), there is considerably more
latitude permissible in setting up the test situation than would be
permissible if we were attempting Level IV assessment. Role-playing,
micro-teaching, and other forms of simulation may sometimes be
appropriately employed.

The problem of obtaining an accurate scorable record of the
performance may be attacked with the aid of the extensive experience
gained in research (done mostly during the last two decades) using
recently developed techniques for coding classroom behavior on the
basis of direct observation. (See, for example: Medley and Mitzel,
1963; Simon and Boyer, 1967, 1970; Boyer, Simon, and Karafin, 1973.)
While it is not likely that these techniques can be adapted to measure
all of the competencies we need to assess, the methods used can be
adapted to the construction of instruments that will.



Scorability of the behavior records depends to a great extent on the
amount of inferential judgment required of the observer in making the
record. The typical rating, which is used in too many PBTE programs
today, is not scorable in the sense we mean because it does not yield a
behavior record at all. Ratings record judgments or evaluations for
which the relevant behavior has only been registered mentally. Thus,
the rater observes, remembers occurrences which seem relevant to him,
combines them in some unspecified way to form a composite picture, and
forms an evaluation based on his own conception of good teaching or
whatever measure is being rated. However, only the evaluation is
recorded, not the behavior. The idiosyncrasies, the subjectivity, the
biases, and the errors of judgment of the raters are interposed between
the behaviors to be assessed and the evaluation which is recorded.

The crucial and difficult task too often neglected is that of spe- 
cifying the competency in question in behavioral terms— in terms of what 
the teacher does and how often (with the necessary contingencies also 
specified) rather than in terms of how "well" he performs or how "appro- 
priate" his behavior is. We shall have more to say later about how to 
go about the specification task. Once it is completed, the development
of the procedure for observing and recording a sample of behavior is
greatly facilitated. (The reader is referred to the discussion of
category and sign systems in Medley and Mitzel, op. cit., pp. 298-305.)

Scoring the behavior record should be a mechanical procedure, that
is, it should be possible to have the scoring done routinely by a clerk
or a computer. Use should be made of mark-sensing recording forms or of 
the devices now available which make it possible for the recorder to re- 
cord directly on magnetic tape. 
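
Mechanical scoring of this kind amounts to counting coded events. The sketch below assumes a record in which the observer has written one category code per observed event; the category names are item names quoted elsewhere in this paper, but the record itself and the function name are invented for illustration.

```python
from collections import Counter


def score_record(record: list[str], categories: list[str]) -> dict[str, int]:
    """Tally how often each category was coded; no judgment is involved."""
    unknown = set(record) - set(categories)
    if unknown:
        raise ValueError(f"codes not in the category list: {sorted(unknown)}")
    counts = Counter(record)
    # Report every category, including those never observed (count 0).
    return {category: counts[category] for category in categories}


# Hypothetical category list and observation record (one code per event).
CATEGORIES = ["Suggests, Guides", "Feedback, Cites Reason", "Orders, Commands"]
record = [
    "Suggests, Guides",
    "Orders, Commands",
    "Suggests, Guides",
    "Feedback, Cites Reason",
    "Suggests, Guides",
]

print(score_record(record, CATEGORIES))
```

Because the function only counts, two clerks (or two runs of the program) scoring the same record must arrive at the same totals; any disagreement can arise only at the recording step.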

We have tacitly assumed in the discussion above that the competency 
to be assessed is a skill manifest in classroom interaction because such 
competencies seem to give the most trouble. The same requirements apply 
to assessments of other competencies, of course, but tend to be easier to
meet.

Establishing Validity of Level II Measures

As we have pointed out, the teacher education enterprise must ulti- 
mately be defended on the grounds that it somehow results in more pupil 
learning in the schools. We must validate teacher education by showing 
that the lines of influence in Figure 1 exist, that is, that teachers' 
training experiences (Level I) may be expected to affect pupil outcomes 
(Level IV). 

Figure 2 provides a basis for discussing strategies for establish- 
ing the existence of these lines of influence. The strategies proposed 
are represented by the dotted arrows in the figure. Let us first define 
them from the standpoint of the research worker. 

Research attempting to establish empirically the existence of
relationships between teacher behavior (Level II) and pupil outcomes
(Level IV) may be called research in teacher effectiveness. Research
attempting to relate pupil behaviors in school, or learning experiences
(Level III), to outcomes (Level IV), may be called research in classroom
learning. Research attempting to relate teachers' training experiences
(Level I) to their teaching behavior (Level II) may be referred to as
training research.

These three types of research may all be defended as viable and useful
strategies. A fourth type which has been proposed in the past would
attempt to relate training experiences (Level I) to pupil outcomes (Level
IV), and may be called research in teacher education, perhaps. This
strategy has been advocated as a means of program validation and perhaps
for non-PBTE programs it is all we have. But when assessments in the
program are based on teacher performance (Level II), validation of a
performance-based program and research in teacher effectiveness are
identical processes.

Or, to put it differently, if a PBTE program is defined in terms of
Level II performance competencies of its graduates, validation of those
competencies is de facto validation of the program. Research in
performance-based teacher education then becomes a two-step process.
Training research, which relates Level I (training) to Level II
(performance), becomes the same thing as program evaluation. Program
validation as defined here becomes exportable, and importable; and all
the literature on research in teacher effectiveness, for what it is
worth (see Morsh and Wilder, 1954; Rosenshine, 1971; Rosenshine and
Furst, 1971, 1973; Dunkin and Biddle, 1974), becomes relevant. And
whatever efforts at program validation are carried on in the local
program augment the knowledge hitherto developed only as research in
teacher effectiveness. A false distinction disappears and we have in
the making a true symbiotic relationship between the researcher and the
program evaluator. No longer may the researcher look down on the
evaluator nor need the evaluator take a back seat at AERA meetings. We
can hear the teacher educator declare: "We have met the researcher, and
he is us!"

Assuming that the reader who is still with us agrees with our
conclusion that Level II assessment, assessment of teacher performance,
is the central concern both of evaluation and research in PBTE, we now
propose to discuss the two practical problems whose solutions are
critical to the success of a PBTE program: the problem of specifying the
set of competencies the acquisition of which is the enabling goal of any
given program and that of validating the program by showing that the
development of these competencies will make a teacher more effective in
promoting pupil learning. The discussion will draw heavily on our own
experiences, in the first instance in working with program personnel
attempting to specify a set of competencies and in the second instance
in doing research in teacher effectiveness.



Suggestions for Specifying Competencies in Assessable Terms 

Specifying competencies in behavioral, low-inference terms is an 
early task and a central one in the development of a PBTE program. Ad- 
mittedly, it is a difficult one, but it is necessary if the nature of the 
competency is to be identified with such precision that an observer or a 
teacher can know whether the competency in question has been demonstrated. 
It is also very useful in helping decide what a curriculum, a course, or 
a training module should contain. 

In our experience in helping groups with the task of specifying the
competencies to be developed in a program, we have encountered a major
communication problem that has seemed quite widespread. Competency
definers seem to fall into two groups. One group tends to produce a set
of competencies that is long and specific; the other tends to produce a
set of competencies that is brief but abstract. Members of neither group
seem to like or even to understand what the other produces, and when
either group approaches the task of designing measures of its own
competencies, it runs into problems.

Members of the first group find that they need something like 657 
measures— one for each competency— and face an instrument construction 
task that is to all intents and purposes impossible to accomplish. Mem- 
bers of the second group find it almost impossible to define ways of as- 
sessing the competencies that are public or even relatively objective, 
but must fall back on broad, general rating scales. 

A Hierarchical Organization 

One way to simplify both the task of specifying competencies
behaviorally and that of developing measures of them is to attempt to
arrange both general and specific descriptions of behavior in a common hier-
archical structure so that the set of more closely specified behaviors is 
seen as part of each broadly defined competency. Figure 3 is a simple 
example of what we mean. 

The figure is not meant to represent criteria recommended for use; 
rather, it is an example of how one might go about organizing such cri- 
teria. It was developed at first to help members of the two groups get 
together to verify that we were talking about different levels of
complexity or abstraction for what were really the same objectives.
What the figure represents is a way of organizing competencies that
seems to be useful. The first row across the top of the chart would
represent the highest level of abstraction: relatively broad statements
about what is important for a teacher to be, do, or accomplish,
statements on which most people would agree. One example is the
statement shown: "The growth of the child is facilitated by an orderly
but an emotionally supportive environment."

Disagreement with that statement as an objective is not likely.
Perhaps this reflects the fact that it does not really have enough
concrete meaning so that you could either agree or disagree with it at
this level of abstraction. You can get agreement on almost anything if
you go high enough up the abstraction ladder; if you become sufficiently
abstract, you are using words that mean different things to different
people and everyone is agreeing with his own meaning. Just the same, it
is useful, almost necessary, to begin with such a common base.

In the next row of the figure, in which we begin to break the broad 
statements down, we begin to face differences within the group which must 
be recognized and dealt with. It is here that differences must be recon- 
ciled before the program gets off the ground. 

Returning to the example, we have listed two clear aspects of the
initial statement on which we will assume agreement has been reached.
One is that the order must be obtained by gentle or noncoercive means if
the environment is to remain supportive. The other is that there must be
a warm emotional climate. These two statements are listed on the second
level of the hierarchy, called Broad Characteristics. Although more
concrete than the statement above them, these behaviors are still not
objectively measurable. Observers will still differ, for instance, in
how warm is "warm." These behaviors are recognized to be characteristics
of desirable classroom behavior but they are still not specified in
enough detail to be objectively assessable.

At the third level, we begin to identify what may be called summary
measures of behavior that are recognized as components or aspects of the
broader characteristics. Each summary measure is made up in turn of
specific items of behavior and each behavior item is defined with the
care necessary for inclusion in a manual of instruction for classroom
observers. Items must be specified in enough detail so that an observer
can be trained to recognize reliably each item of behavior that he sees
demonstrated during a period of observation and so that the teacher
himself knows whether he has exhibited any one of them. This is the
fourth level of the hierarchy, the most specific; it provides the basis
for obtaining a scorable record of behavior.
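
The four levels can be pictured as a nested structure in which each summary measure is simply the total of its specific items. The sketch below is one possible representation, not the layout of Figure 3: the class names, the summary-measure label "Gentle verbal control," and the item counts are hypothetical, while the goal statement and the item names are quoted from this paper.

```python
from dataclasses import dataclass, field


@dataclass
class SummaryMeasure:
    """Level 3: a cluster of specific items (level 4) scored together."""
    name: str
    items: list[str]

    def score(self, counts: dict[str, int]) -> int:
        # A summary score is the total frequency of its items in the record.
        return sum(counts.get(item, 0) for item in self.items)


@dataclass
class BroadCharacteristic:
    """Level 2: a characteristic too general to observe directly."""
    name: str
    measures: list[SummaryMeasure] = field(default_factory=list)


@dataclass
class Goal:
    """Level 1: the abstract statement on which most people agree."""
    statement: str
    characteristics: list[BroadCharacteristic] = field(default_factory=list)


goal = Goal(
    statement="The growth of the child is facilitated by an orderly but "
              "an emotionally supportive environment.",
    characteristics=[
        BroadCharacteristic(
            name="Order obtained by gentle or noncoercive means",
            measures=[
                SummaryMeasure(
                    name="Gentle verbal control",  # hypothetical label
                    items=["Suggests, Guides", "Feedback, Cites Reason"],
                ),
            ],
        ),
    ],
)

# Hypothetical item frequencies from one observation period.
counts = {"Suggests, Guides": 5, "Feedback, Cites Reason": 2, "Orders, Commands": 1}

for characteristic in goal.characteristics:
    for measure in characteristic.measures:
        print(characteristic.name, "/", measure.name, "=", measure.score(counts))
```

Only the lowest level is ever observed; everything above it is computed from the item counts, which is what keeps the upper levels assessable.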

As an example at this most specific level, the item "Suggests,
Guides" includes such teacher statements as "How about cutting it over
there, Jimmy, OK?" "I wonder if you would shut the door for us, Bill,"
"Bob, would you mind moving so John can sit down?" These statements are
suggestions for change in behavior that have the characteristic of being
"softened" by a "please" or "OK?", or by being phrased as questions (Soar,
Soar and Ragosta, 1971).

Another item at the gentlest level of control, "Feedback, Cites
Reason", is coded when the teacher gives information which implies a
change in behavior without directly asking for it. For example, the
teacher comment "I'm having trouble hearing" is not a direction, but it
does give information that implies and probably produces a change in
pupil behavior. An instance was observed in a first-grade classroom
during the first week of class. The number one and number two
trouble-makers (two boys who could be so identified in five minutes)
were together in the back row of a small group, jostling, nudging, and
pinching. The teacher looked at one of them and said, "John, I think
there's room for you up here," gestured toward a spot by her feet, and
waited. In two or three seconds, John came to sit by the teacher's feet.
The other pupils did not seem to see it as a put-down; but it did create
a big gap between the two trouble-makers and the problem was solved.

At an intermediate level of control (not shown in the figure) would
be statements such as: "OK, anyone who wants to go to the bathroom, get
in line," which is not very coercive, but clearly a direction with a
reason. Other examples would be "Get out your arithmetic books and open
them to page 27."; "When you've finished, put your papers on my desk."

At the coercive end of the scale would be a statement like, "Jim,
stop that!" which would be coded "Orders, Commands," and probably "Sharp
Tone" as well.

Using a hierarchy like the one described above lets us talk about
particular aspects of a rather vague concept like "orderly but emotionally
supportive environment" in terms of a series of fairly objective statements
about teacher behavior. Because each specific behavior is defined with
sufficient care so that agreement between observers can be made acceptably
high, we can reliably assess this aspect of behavior, and a teacher can
know whether his behavior meets the requirements of the competency.
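
Agreement between observers can be checked directly once two observers have coded the same sequence of events. The paper does not say which index of agreement the authors used; the sketch below assumes two common ones, simple percent agreement and Cohen's kappa (agreement corrected for chance). The two observation records are invented for illustration.

```python
from collections import Counter


def percent_agreement(a: list[str], b: list[str]) -> float:
    """Proportion of events to which both observers gave the same code."""
    if len(a) != len(b) or not a:
        raise ValueError("records must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    freq_a, freq_b = Counter(a), Counter(b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in set(a) | set(b)) / (n * n)
    if p_chance == 1.0:  # both observers used a single identical code
        return 1.0
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical codes: S = "Suggests, Guides", F = "Feedback, Cites Reason",
# O = "Orders, Commands". Each position is one observed event.
observer_1 = ["S", "S", "F", "O", "S", "F", "O", "S", "S", "O"]
observer_2 = ["S", "S", "F", "O", "F", "F", "O", "S", "S", "S"]

print(f"percent agreement: {percent_agreement(observer_1, observer_2):.2f}")
print(f"Cohen's kappa:     {cohens_kappa(observer_1, observer_2):.2f}")
```

What counts as "acceptably high" is a program decision; the point of the sketch is only that, with behaviorally defined items, the question has a numerical answer.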

It is not our purpose to initiate a semantic argument over whether
competencies are narrow or broad, and we would propose that varying
degrees of generality are useful for different purposes, but we suggest
that the term competency be used at the level of the teacher's ability
to integrate the specific behaviors necessary to produce one of the
Broad Characteristics of the classroom. The Summary Measures and
Specific Items of Behavior would define the competency, give it
behavioral meaning, and thus make it assessable.

Moving Up and Down the Hierarchy. No less important is the aid to
communication provided by the ability to move up and down this
abstraction ladder at will. The person who, when asked to name a
competency, gives a theoretical statement or a broad, high-inference
label for a classroom behavior can be encouraged to move lower in the
hierarchy by being asked questions like, "What would this behavior look
like if you saw it happen?" "What would a teacher who is doing this do
that one who is not doing it would not do? What kind of behaviors would
differentiate them?" Questions such as these encourage people to be more
explicit about what they mean and at the same time generate the
statements lower in the hierarchy needed to make objective measurement
possible.



A person who, when asked to define a competency, makes a statement
like, "The teacher should avoid direct commands and orders," can be
encouraged to move higher in the hierarchy by questions like, "Why would
you care about that?" "Why is that important?" "How does it make a
teacher more effective?" What typically happens is that the person moves
up the scale to give a broader conception of how these behaviors relate
to reality and to his scheme of values, which puts the competency in
larger perspective. And at the same time he suggests how the items
should be combined into composites or clusters that are internally
consistent.

Thus there are two advantages to this sort of hierarchical
arrangement: (1) communication is clarified and agreement on the
competencies to be adopted as program goals is facilitated, because the
generally valued broad competency has operational meaning given to it by
the items which it includes; and (2) at the same time, the items for an
objective measuring instrument of each broad competency are specified so
that the competency can be measured.

Figure 3 illustrates a part of one of many possible hierarchies that
might be developed for assessing teacher performance. Others might relate
to the nature and frequency of cognitive questioning, the manner of
structuring learning activities, the use of experimental techniques: any
of the broad goals for which modules and programs might be developed. But
the important point is that the behaviors relevant to whatever broad goal
or concept is specified have been identified and defined and the
performance has been made assessable.

The Empirical Test. There is a critical issue which arises whenever
we combine items on a priori grounds into clusters intended to represent
broad competencies. How can we be sure that the specific items of
behavior belong together in the real world as well as in the world of
theory? Usually, the items will have been selected or developed to
measure behaviors which are believed to represent a single aspect of good
teaching. This procedure is similar to the way a series of items on an
achievement test is selected to sample knowledge in a homogeneous
subject-matter area. But it is critical to establish empirically that the
items do belong together, that the cluster is internally consistent,
because it is quite likely that some items that are believed to go
together in the conceptual scheme will not hang together empirically. We
do not know that much about the dynamics of teaching yet.
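
One simple form of this empirical check is an item analysis in which each item is correlated with the sum of the other items in its cluster; items with low or negative correlations do not "hang together" with the rest. The sketch below is a minimal illustration under stated assumptions: the frequencies for six teachers are invented, the item names are borrowed from this paper, and the cutoff of 0.30 is an arbitrary value chosen for the example, not a recommendation from the source.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


def item_rest_correlations(scores: dict[str, list[float]]) -> dict[str, float]:
    """Correlate each item with the sum of the other items in the cluster."""
    result = {}
    for item, values in scores.items():
        rest = [
            sum(other[i] for name, other in scores.items() if name != item)
            for i in range(len(values))
        ]
        result[item] = pearson(values, rest)
    return result


# Invented frequencies for six teachers (one value per teacher per item).
cluster = {
    "Suggests, Guides":       [8, 6, 7, 2, 3, 1],
    "Feedback, Cites Reason": [7, 7, 6, 3, 2, 2],
    "Smiles":                 [9, 5, 8, 3, 2, 1],
    "Touches":                [1, 6, 2, 7, 3, 5],
}

CUTOFF = 0.30  # arbitrary threshold for this illustration
correlations = item_rest_correlations(cluster)
for item, r in correlations.items():
    print(f"{item:24s} r = {r:+.2f}")
inconsistent = [item for item, r in correlations.items() if r < CUTOFF]
print("items to reconsider:", inconsistent)
```

In these invented data only "Touches" falls below the cutoff, so it would be removed from the cluster or assigned to a separate one before composite scores are computed.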

This is what happened when we tried to assess gentle teacher control
(Soar, 1973). In addition to the verbal items described earlier, there
was also a smaller set of nonverbal items grouped with the verbal items
under the assumption that they both fell on a single dimension extending
from gentle to harsh teacher control. However, factor analysis indicated
that this assumption did not hold. The verbal gentle control items did
hang together, but most of the nonverbal gentle control items fell into a
different subset which was fairly consistent internally, but not closely
related to the verbal scale. We had oversimplified the area.

Interestingly enough, there was some crossover. Teacher smiling,
which we classed as a nonverbal behavior, belonged in the "verbal"
composite instead of the "nonverbal" one. Probably the reason is that
people smile as they talk, using their faces as an additional source of
stimulus or feedback. Other nonverbal behaviors, such as "Touches" or
"Gestures", tended to occur more independently of verbal behavior. This
made sense after the fact, but these results were not anticipated. It is
not at all unusual for the facts to contradict our best theories in this
way.

After we had discovered this clustering in the analysis of the data, 
we remembered a teacher and an aide in a first-grade classroom. The teach- 
er often smiled at pupils, praised them, and was very warm and suppor- 
tive, but she never touched a child. Her aide, on the other hand, seldom 
smiled or praised a child, but she rarely passed one without ruffling his 
hair or giving him a pat. She almost never sat down without a child on
her lap and very often had one on each knee. We thought at the time that
it was a nice example of differentiated staffing! But we see it now as
an example of the behaviors fitting together the way the empirical
analysis says they do.

The important point to learn from these examples is that if behav- 
iors are put together that are not at least moderately intercorrelated, 
whatever meaning they are supposed to have is destroyed, and with it the 
discriminating power of the measuring instrument. On our original scale, 
this teacher and her aide, different though they were, would probably have
had similar scores on our composite, scores somewhere in the middle. That
this kind of empirical check is critical whenever items are combined should
be obvious from this example.*
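
The loss of discriminating power can be shown with a few lines of arithmetic. The frequencies below are invented to mirror the description of the teacher and her aide (they are not data from the study), and the item "Pats" and the function name are hypothetical.

```python
def composite(counts: dict[str, int], items: list[str]) -> int:
    """Sum the observed frequencies of the listed items."""
    return sum(counts[item] for item in items)


VERBAL = ["Smiles", "Praises"]     # smiling clustered with the verbal items
NONVERBAL = ["Touches", "Pats"]

# Invented frequencies per observation period.
teacher = {"Smiles": 9, "Praises": 8, "Touches": 0, "Pats": 1}
aide = {"Smiles": 1, "Praises": 0, "Touches": 9, "Pats": 8}

for label, person in (("teacher", teacher), ("aide", aide)):
    print(
        f"{label:8s}"
        f" single composite = {composite(person, VERBAL + NONVERBAL):2d}"
        f"   verbal = {composite(person, VERBAL):2d}"
        f"   nonverbal = {composite(person, NONVERBAL):2d}"
    )
```

On the single a priori composite the two adults receive the same score; only the two empirically separate scales show how differently they behave.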

Importing Past Research 

In specifying competencies and developing measures for them it seems
important to pursue leads from past research on teacher effectiveness, not
only for the obvious reason that the research may suggest dimensions of
behavior which might otherwise be neglected, but also because it may
suggest specific items of behavior that reflect those dimensions. This is
really an extension of the concept of "importation" of teacher
effectiveness research for program validation discussed earlier: its
extension to the initial selection of competencies and program
development. It would be wasteful for programs not to make this kind of
use of present knowledge developed in teacher effectiveness research, as
well as using such knowledge as it becomes available from program
validation studies.

A number of reviews of this literature have been cited earlier:
Morsh and Wilder, 1954; Rosenshine, 1971; Rosenshine and Furst, 1971, 1973;
and Dunkin and Biddle, 1974. Joyce (1974) and Kay (1975) have also
discussed the various possible conceptual bases from which competency lists
may be derived.



* There are two ways of conducting such an empirical check. One is
to do an internal consistency item analysis of each composite and "purify"
it by eliminating nonconsistent items. The other approach is to factor
analyze the entire set of measures and see whether the factor structure
that results corresponds to the a priori structure. Both methods have
advantages and disadvantages. We would offer one suggestion: if you use
factor analysis, view the results with caution. The sample sizes
available in this kind of study are usually smaller than the minimum
recommended by experts, and a number of influences can affect the results.

Several concepts from our research seem to have implications for
competency specification. One such concept is the usefulness of
distinguishing three areas within which the teacher exercises control
that are
guishing three areas within which the teacher exercises control that are 
often confused (Soar and Soar, 1973): (1) control of the behavior of 
pupils, (2) control of choice of subject matter, and (3) control of the 
thinking processes which the pupils use. Our data indicate that when 
these three types of control are distinguished, their effects on pupil 
growth are found to differ. Some data (collected in grades three to six) 
support the idea that the teacher's control of behavior may have quite 
different consequences for pupils than the teacher's control of thought 
processes. A factor called Indirectness vs. Silence and Confusion ap- 
peared to represent a "freeing" yet orderly style of teacher-pupil inter- 
action, and showed a significant positive relationship with pupil gain in 
creativity. Another factor, called Freedom of Physical Movement, however, 
showed a significant negative relationship with creativity gain (Soar, 
1966, p. 178). 

Subjectively, these differences in teacher control can be recognized 
in occasional classrooms. In a second-grade classroom, there was close 
control of both subject matter and pupil behavior (the teacher had taken
over an unruly class in the middle of the year). There was little talking
between children and pupil movement was brief and task-related, with
teacher and pupil talk so quiet it was hard to hear half-way across the room,
even during recitation. Yet the teacher did not restrict thought processes; 
rather, she supported complex thinking by pupils. In reading groups, her 
questions were divergent, encouraging pupils to infer meanings and motives; 
in arithmetic, she sought alternative ways of solving problems; among pupil 
reports of a field trip, she valued a poetic description as highly as a 
reportorial one. The teacher closely controlled pupil behavior and allowed 
no choice of subject matter, but worked hard at freeing pupil thinking. 

In a kindergarten classroom a different combination of controls exis- 
ted. The teacher required pupils to choose an activity from the many ma- 
terials set out in interest centers and refused even to suggest alterna- 
tives when asked. When she was asked a question which was subject matter 
related, she was likely to answer with a question. Yet it was clear that 
there were well-established rules of behavior such as "No running", "No 
loud talking", "Hands to yourself", and "Don't interfere with other people's 
work." In this classroom, choice of subject-matter and thought processes 
were free, but behavior was controlled. 

Although these distinctions seem reasonable, once proposed, the data
indicate that most teachers do not make the distinctions as they manage 
behavior and teach. 

A different concept is an empirically derived distinction between 
"structure" and "control", two terms which may seem at first to be closely 
related. In the sense intended here, structure represents the set of stan- 
dard operating procedures which the teacher and pupils understand in com- 
mon, such as the sequence of activities which is followed daily, and the 
limits of behavior which pupils understand and accept. In contrast,
control is made up of the moment-to-moment, face-to-face interactions
between teacher and pupils intended to modify the behavior of pupils.
The data [...] structure in their classrooms [...]

[...] in a classroom that was part of a program in which the
classrooms were intended to be "open" and in which the teacher seemed
to feel obligated to give pupils greater freedom than she was
comfortable with; as a result, the activity and noise level [...] noise
reached [...] she stepped in [...] things to a grinding halt. [...] done
this, she withdrew and [...] the cycle repeated itself again and again.
It seems likely that the problem arose because the teacher, and perhaps
the program she represented, did not recognize the need for more
structure and less control: more structure would decrease the need for
such [...] the teacher had recognized [...] structure [...] that could
have [...] while freeing choice of subject matter and thought
processes.

The data indicated that this alternation of close control and the
absence of control was likely to be destructive in terms of pupil
outcomes.

Another concept from past research which might be considered is the
"wait time," i.e., the length of time the teacher waits for a pupil to
answer a question before she intervenes and the length of time she waits
after a pupil response or statement before she reacts. There [...]
associated with a number of desirable changes in pupil [...]: pupils
interact more, the number of inferences and con[...]

These are only a few examples of dimensions of behavior suggested by
the [...] literature which are probably not widely known but which should
be considered in specifying competencies. They illustrate the value of
importing knowledge from teacher effectiveness research into program
development.

Program Validation

The Need for Program Validation

We are concerned by the fact that we are spending so many millions
of dollars and so much effort in the development and implementation of
programs, with so little knowledge of what kinds of teacher behaviors
really are associated with increased growth of pupils. The possibility
that all this effort may be wasted is a real one. A critic of educational
research observed some years ago that virtually all the research on
teacher effectiveness could be summarized by one general conclusion:
nothing makes any difference. We are not as pessimistic as that, but must
agree that we do not know very much and that some portion of what we
"know" may not be true.

There is no alternative to moving ahead with building today's programs
on the basis of what we know today. We cannot wait for the researchers
to answer all the questions because they are not going to answer them
any time soon. Those of us who call ourselves "the researchers" are prob- 
ably surer of this than anyone else. But in a situation in which we so 
often find that the things we thought we knew are not true, there is a real 
and present danger in building a complex program without beginning as soon 
as possible to find out whether the behaviors we "know" will produce
desired changes in pupils will, in fact, do that. As hard as it has been in
the history of educational research to show that anything teachers do makes
a difference for pupils, it seems critical to begin early in this whole
sequence to find out whether what we are going to considerable time, trouble,
and expense to train teachers to do makes a difference to the pupils these
teachers ultimately teach. The public and the legislators will want to
know and they have a right to know. They are going to want evidence that
what our teachers do leads to more growth by pupils and that we, in our
programs, can produce the kinds of teacher behaviors which in turn produce
more growth in pupils.

It is critical to keep the two steps of program evaluation and
validation separate for the reasons enumerated earlier in this paper; but both
steps must be accomplished, and now.

It would be reasonable to ask, at this point, "Why is it undesirable
to use pupil measures to evaluate the teacher, but necessary to use them
to validate teacher behaviors?" The two activities have a number of basic
differences. In program validation, pupil gain measures are not needed
for every teacher in every program. Such measures are needed only for
samples large enough to enable us to learn and verify relationships between
teacher behavior and pupil gain. The considerable amounts of error in the
measurement process do not spuriously injure or benefit individual teachers;
they only weaken the relationships found between measures. Greater time
and resources may be given to collecting measures and analyzing relations
between them in program validation, which needs to be done only once, than
in evaluating teachers, which must be done over and over again. These
differences make program validation by pupil outcomes feasible, even though
teacher evaluation by pupil outcomes is not.
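
The claim that measurement error weakens relationships between measures, rather than favoring or penalizing any one teacher, can be illustrated with a small simulation. The sketch below is illustrative only: the number of classrooms, the true correlation of .50, and the reliability of .30 assumed for the gain measure are hypothetical values, not figures from the studies discussed here.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n_classes=2000, true_r=0.5, gain_reliability=0.3, seed=1):
    """Return (r with true gain, r with error-laden observed gain).

    All parameter values are hypothetical and chosen for illustration.
    """
    rng = random.Random(seed)
    # True gain has unit variance, so this error SD makes
    # var(true) / var(observed) equal to gain_reliability.
    error_sd = ((1 - gain_reliability) / gain_reliability) ** 0.5
    behavior, true_gain, observed_gain = [], [], []
    for _ in range(n_classes):
        b = rng.gauss(0, 1)
        g = true_r * b + (1 - true_r ** 2) ** 0.5 * rng.gauss(0, 1)
        behavior.append(b)
        true_gain.append(g)
        observed_gain.append(g + rng.gauss(0, error_sd))
    return pearson(behavior, true_gain), pearson(behavior, observed_gain)


r_true, r_observed = simulate()
print(round(r_true, 2), round(r_observed, 2))
```

Under these assumptions the observed correlation should fall near the true correlation multiplied by the square root of the reliability of the gain measure. The relationship is weakened but still detectable in a sample of this size.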

Some Examples of Weakness of Current Theory



Although program development must proceed, it seems important to
recognize the weakness of the theory on which it is based.

Perhaps the two innovations most often currently advocated are
behavior modification (or contingency management, or behavior analysis,
depending on the label used), on the one hand, and the movement for open
classrooms on the other. Both claim support from theory and research.
But if each of these is taken as a complete educational program in itself
(and that seems to be what is advocated), how could both be right when the
programs are as different in character as they are? Some disquieting
questions arise about how much help current knowledge is in formulating
effective programs and reinforce the need for program validation studies
whose outcomes can be fed back into program revision.

There is some research which relates to this question. We now have
four sets of data which raise questions about the usefulness to pupils of
extreme amounts of teacher control--either high (as in behavior
modification) or low (as in open classrooms). Figures 4, 5, and 6 present data
from different pupil groups and different observational measures. The
behavior measures have in common the fact that they represent
teacher control in some way, primarily control of thought processes. Each
curve represents a nonlinear relation which may be described as an inverted
"U", since it indicates that, as teacher control increases from the
minimum, pupil growth increases for the outcome measure used, but only up to
a point. Beyond this point, increasing teacher control is associated with
decreasing pupil growth. That is, there appears to be an optimal amount
of teacher control for a given growth measure, which is neither the most
nor the least control in most cases. Parenthetically, not only achievement
gain, but also self-concept and several personality measures show this
tendency.
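
One way to locate such an optimal amount of control is to fit a second-degree curve to the classroom data and find its peak. The following is a minimal sketch under stated assumptions: the nine control scores and gain values are invented for illustration, and ordinary least squares on the terms 1, x, and x squared stands in for whatever curve-fitting procedure an investigator might actually choose.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2 via the normal equations."""
    cols = [[1.0] * len(xs), [float(x) for x in xs], [float(x * x) for x in xs]]
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * y for p, y in zip(ci, ys)) for ci in cols]
    return solve(xtx, xty)


# Hypothetical data: teacher control scored 1 (least) to 9 (most),
# with mean pupil gain highest in the middle of the range.
control = [1, 2, 3, 4, 5, 6, 7, 8, 9]
gain = [2.0, 3.5, 4.6, 5.1, 5.3, 5.0, 4.4, 3.6, 2.1]

a, b, c = fit_quadratic(control, gain)
optimum = -b / (2 * c)  # peak of the fitted curve; meaningful only when c < 0
print(c < 0, round(optimum, 1))
```

A negative coefficient on the squared term marks the inverted "U", and the peak of the fitted curve estimates the amount of control associated with the greatest gain for that outcome measure.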

Relationships such as these should be considered in specifying
competencies.

Another illustration of the weakness of current theory is the
emphasis given to increasing the cognitive level of teacher questions. Yet
Taba, Levine and Elzey (1964) concluded that unless the teacher first
spent sufficient time with pupils at the lower cognitive levels, the pupils
were unable to sustain higher level thinking. Dunkin and Biddle (1974,
p. 24[illegible]) review a study by Rogers and Davis showing "a significant
negative [illegible] questioning and pupil performance on test
items of the analysis type." Three sets of our data also indicate that
too much of the interaction in the classroom can be at too high a
cognitive level (Soar and Soar, 1972, 1973). Several dimensions of classroom
interaction were scored which represented the frequency with which
relatively abstract interaction took place between teacher and pupils,
following Bloom's Taxonomy of the Cognitive Domain and a Deweyian approach
to teaching. These measures tended to be negatively associated with gain
in both pupil achievement and self-concept. In some cases, the negative
relationships held for the total pupil group, but in other cases, it
appeared to be true for disadvantaged pupils but not for advantaged pupils.

This is not to conclude that a teacher should interact only at the
lower levels, but it does imply that it is possible for a teacher to
interact too often at too high a level. The need for a "match" between
where the pupil is and where the teacher is seems obvious, but how often
do theoreticians advise caution about teachers working at too high
cognitive levels?

Figure 4

Teacher Indirectness Related to Pupil Growth
for 55 Classrooms, Grades 3-6
(after Soar)

Figure 5

Relation Between a Teacher Practices Observation Record Control Factor

Findings such as these raise questions about the soundness of the
theory on which recommendations for teacher behavior (and probably spe- 
cification of competencies) are based and emphasize the usefulness of 
importing findings from teacher effectiveness research. There seem to 
be two aspects to the problem of defining a competency: what is the na- 
ture of the behavior and how much of it is desirable? 

Another example of the inadequacy of current knowledge is our oc- 
casional ignorance even about which variables we should study. We stum- 
bled across one such variable by accident. We had set up an analysis 
for looking at school year gain and wanted next to change the focus of 
the analysis to look at pupil gain the following summer. This change 
would have been easy except that we found we were one variable short on
our data cards, so we put in what we thought was an extraneous variable
as a "placeholder": one which represented the number of pupils on which
the mean for each classroom was based. You can guess what happened: it
turned out to be a moderately powerful predictor of gain in that
analysis. After the fact, this made sense. The variable represented the
number of pupils who were present through three days of testing on three
different occasions. The examiners had gone back for three make-up
periods each time, but there were still a number of pupils we lost.

Our guess was that if the pupil came from a home in which school 
was valued, he did not have any choice; he was there. But if there was 
not much concern at home, he was able to drop out as the testing went on 
and consequently he dropped out of our data set. So probably this var- 
iable was an unobtrusive measure of attitude toward education in the home. 
This interpretation is somewhat uncertain because attendance is often
assumed to represent the pupil's attitude toward teacher and classroom.
But analyses of the data suggested that pupil attendance as well as
"survival rate" related to gain over the summer but not during the school
year; it seems likely that family influences are stronger during the summer
than the school year, whereas, if attendance had reflected pupil attitude
toward the classroom, it would have been likely to show an effect during
the school year.

In summary, these are only a few examples of the weakness of current 
theory as a basis for specifying competencies and developing programs.
There is no question of the need to proceed with program development now, 
but these examples emphasize the need to use the empirical knowledge which 
does exist in teacher effectiveness research and to feed back new knowledge 
into program modification as rapidly as possible. 

Some Issues in Validation 

Past research in teacher effectiveness suggests a number of issues 
which may be important in program validation. There appear to be two 
classes of issues— one dealing with the limits for a given validity re- 
lationship, or the terms under which it is true; the other with the statis- 
tical analyses. 

Validity Specifications. The question is not, "Is the teacher
behavior valid?" The question is instead, "For what is it valid?" For what
kind of outcome, e.g., a complex one or a simple one? Unless the teacher
behavior patterns which facilitate growth toward both kinds of objectives
are the same, then the teacher must know how to produce both kinds of
behavior and when. There is reason to suspect that one pattern involves
open, accepting, clarifying, reflective behaviors while the other is based
on more tightly structured, reinforcing behaviors. The behavior which is
valid for one extreme of this range of outcomes may not be
valid for the other extreme. Figures 4 and 5 illustrate this possibility.

We not only need to know what kind of objectives a teacher behavior
is valid for, but we also need to know the kinds of pupils for whom it is
valid. Current data indicate that this may be a critical question. The
social status of the pupil, for instance, sometimes makes a difference
in the kind of teacher behavior which is associated with most growth for
him.

Figure 7 illustrates the relation found between pupil gain in
reading and teacher control by means of coercion and negative affect, plotted
separately for groups of pupils of low and high socioeconomic status. For
high status pupils, the measures were essentially unrelated; but for low
status pupils, increases in negative teacher control were associated with
decreases in pupil growth in reading (from data reported in Soar and Soar,
1973). While not strong, the difference in relationship for the two groups
was statistically significant.

Figure 7

Teacher Strong Control in Relation to Pupil Gain
in Reading by Socio-Economic Status

The fact that the greater decrease in gain occurred in the low social 
status group was opposite to expectations. We had expected that lower 
class pupils would have met negative affect more often, would have adapted 
to it, and that it would have had less impact on them; but these conditions 
were apparently not true. Instead, what may have happened was that the 
middle or upper class child was more likely to have support from home which 
compensated to a degree for an unfortunate classroom and which made him 
less dependent on it, but that the growth of the lower class pupil was more 
dependent on the nature of the classroom. 

Parallel findings have been reported by Brophy and Evertson (1974).

In addition to the questions of valid for what and valid for whom,
we need to know, valid for how long? As we have indicated earlier in
relation to multiple learning objectives, it may be that effective teaching
for immediate learning may be different from effective teaching for
long-term objectives. At any rate, it seems risky to assume that conclusions
can be generalized from one to another without empirical evidence that
the generalization is sound.

The problems cited so far represent a formidable challenge for
program validation, but there is still another consideration which needs to
be taken into account--the characteristics of the student teacher. We
need to know what behaviors are a "best fit" for what kind of teacher,
because probably not all kinds of behavior can be used effectively by all
candidates in teacher education.

This degree of complexity in the nature of effective teacher behavior
greatly increases the difficulty of validation; but if we fail to take it
into account we risk producing in program validation the same inconsistent
and often nonsignificant results which have been common in teacher
effectiveness research.

The Need for Complex Analysis. Perhaps one of the reasons that
research on teacher effectiveness has not been more productive in the past
is that it has used an inappropriate model. The true nature of the
relationship between teacher behavior and pupil growth appears to be very
different from that implicit in most of the research which has been
carried out. Most past research has sought a small number of large effects
but it seems to us that an appropriate model would look for many small
effects which are probably cumulative. If our hypothesis about large
numbers of small effects is accepted, then the nature of the research
changes considerably. We need to know many more things about the behavior
of a teacher and the characteristics of pupils in order to identify the
specific teacher behaviors which are effective for particular pupils.



Another defect in past research in teacher effectiveness has been
the use of analyses which examined only linear relationships and assumed
that more is always better. But a fairly common recent finding is that
relations between classroom behavior and pupil growth are often nonlinear
as illustrated in Figures 4, 5, and 6 and in similar findings reported by
others (Solomon, Bezdek and Rosenberg, 1963; Coats as cited by Flanders,
1970; Brophy and Evertson, 1974).

This finding of nonlinearity seems intuitively reasonable. What 
does not seem reasonable is that we should have expected to find, with- 
out limit, many kinds of teacher behavior that increase pupil growth as 
they are increased. But whenever we calculate a linear correlation we 
implicitly assume that this is so. 
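
The consequence of that implicit assumption can be shown with a short calculation. In the sketch below, which uses invented data forming a perfectly symmetric inverted "U", the linear correlation between control and gain is essentially zero even though gain is completely determined by control. The choice of 5 as the midpoint is an assumption built into the example.

```python
import statistics


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical scores: control runs from 1 to 9 and gain peaks at 5.
control = list(range(1, 10))
gain = [16 - (x - 5) ** 2 for x in control]

# Linear correlation: asks only whether more control goes with more gain.
r_linear = pearson(control, gain)

# Correlation with squared distance from the midpoint: asks whether
# departures from the middle, in either direction, go with less gain.
distance_sq = [(x - 5) ** 2 for x in control]
r_curved = pearson(distance_sq, gain)

print(round(r_linear, 3), round(r_curved, 3))
```

An investigator who computed only the linear coefficient would conclude that control and gain are unrelated, when in these data the relation is as strong as it can be.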

Another aspect of the relationship between teacher behavior and 
pupil growth that is often ignored is the possibility that variables may 
interact in the statistical sense. The question in its simplest form is,
"What is the simultaneous effect of two aspects of behavior? Is it
different from the effect of each considered alone?" Because classroom
behaviors do not occur in isolation, but rather occur in a context of other
behaviors, it seems intuitively sound to assume that the effect of one
may be moderated by the presence or absence of another.

An example of such a statistical interaction occurred in our research
when a variable, the proportion of classroom activities in which the
problem had been chosen by the teacher, was found by itself to be unrelated to
pupil gain. Another variable, the overall amount of recitation, also showed
no linear relation with pupil gain. But when recitation and teacher choice
of problem were examined simultaneously, it was found that greater
achievement gain took place in those classrooms where one or the other of these
variables occurred with considerable frequency, but not both (see Figure 8).
It did not seem to matter which one, but frequent occurrence of either one
or the other was important. On the other hand, if both occurred with
considerable frequency, or if neither did, pupils gained less. Similar effects
were found for other pairs of variables.
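
The pattern just described, in which neither variable alone relates to gain but the combination does, can be reproduced with a few invented classroom records. This is a sketch under stated assumptions: the eight classrooms, the -1/+1 coding of each behavior, and the gain values are hypothetical, and the product of the two codes is used as one conventional way of representing an interaction.

```python
import statistics


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical classrooms: (recitation, teacher_choice, gain), with each
# behavior coded -1 for infrequent and +1 for frequent.
classrooms = [
    (-1, -1, 2.0), (-1, -1, 2.4),   # neither frequent: lower gain
    (-1, +1, 4.1), (-1, +1, 3.9),   # teacher choice only: higher gain
    (+1, -1, 4.0), (+1, -1, 4.2),   # recitation only: higher gain
    (+1, +1, 2.1), (+1, +1, 2.3),   # both frequent: lower gain
]
recitation = [c[0] for c in classrooms]
choice = [c[1] for c in classrooms]
gain = [c[2] for c in classrooms]

# The product is +1 when the two behaviors agree and -1 when they differ.
product = [r * t for r, t in zip(recitation, choice)]

r_recitation = pearson(recitation, gain)
r_choice = pearson(choice, gain)
r_product = pearson(product, gain)
print(round(r_recitation, 2), round(r_choice, 2), round(r_product, 2))
```

Each behavior taken singly shows almost no correlation with gain, while the product term shows a strong negative one: gain is higher where the two behaviors differ in frequency than where they agree.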

It may be possible that this interaction can be related to the
findings of nonlinearity for teacher control. If an intermediate amount of
teacher control is associated with maximum pupil gain, then this
intermediate amount of control can be produced in various ways: either by an
intermediate amount of one dimension of classroom controlling behavior or
by a combination of one kind of behavior which represents control and
another kind of behavior which provides some freedom. Two controlling kinds
of behavior at high levels result in too much control; neither being
present results in too little control for pupil growth, just as in the
inverted "U's" described in Figures 4, 5, and 6.

So far, the concept of statistical interaction has been discussed
in relationship to the simultaneous effects of different combinations of
classroom behaviors on pupil learning. Another example of the same
concept would examine the interaction of entering pupil characteristics and
classroom behavior because both are related to outcomes, as illustrated
in Figure 7. This is the logic of the aptitude-treatment interaction
(ATI) studies. The extent to which such interactions can be made use of
in teacher education or in program development may be limited; the task
of teaching is complex enough without them. But it does seem important
to take them into account in the process of program validation where they
are more easily dealt with.

Figure 8

The Relation of Two Classroom Behavior Measures
with Pupil Gain in Arithmetic Concepts


There are a number of pupil characteristics which seem to
determine in part which teaching behavior is most functional in a given
instance. For example, one of our findings indicated that pupils who were
high in motivation grew more in classrooms in which they were frequently
allowed to carry out assigned work at their seats and then were free to
select other activities when the assignment was finished, while for pupils
who were low in motivation growth was greater in classrooms where this
activity occurred infrequently.

Socioeconomic status has already been cited as another pupil
characteristic which must be considered in identifying which teacher
behaviors will be most functional, as shown in Figure 7.

These findings of nonlinear relationships and interactions, which
seem to make theoretical sense, indicate the need for complex statistical
analyses in program validation because of their potential for dealing
with the reality of the phenomena more adequately than the simple linear
correlations or t tests used in the past. As evidence of this, in two sets
of our data the number of significant linear correlations was about one
and a half times the number expected by chance, but the number of
significant nonlinear relations and interactions was between three and four
times the expected number.
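
The comparison with "the number expected by chance" rests on simple arithmetic, sketched below. The figures are hypothetical: 200 tests at the .05 level are assumed purely for illustration, and the tests are treated as independent, which is unlikely to hold exactly for correlated classroom measures.

```python
from math import comb


def expected_by_chance(n_tests, alpha):
    """Number of 'significant' results expected if no real relations exist."""
    return n_tests * alpha


def prob_at_least(k, n_tests, alpha):
    """P(X >= k) for X ~ Binomial(n_tests, alpha), assuming independent tests."""
    return sum(
        comb(n_tests, i) * alpha ** i * (1 - alpha) ** (n_tests - i)
        for i in range(k, n_tests + 1)
    )


n_tests, alpha = 200, 0.05            # hypothetical values
expected = expected_by_chance(n_tests, alpha)

# One and a half times the chance expectation (15 significant results)
p_linear = prob_at_least(15, n_tests, alpha)

# Three and a half times the chance expectation (35 significant results)
p_complex = prob_at_least(35, n_tests, alpha)

print(expected, p_linear, p_complex)
```

On these assumptions a count one and a half times the chance expectation could still arise fairly often when no real relations exist, whereas a count three to four times the expectation would almost never do so.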

Past teacher effectiveness research, then, seems to have been ask- 
ing questions that were too simple to represent the reality of the situ- 
ation. At an intuitive level it is clear that the classroom is a highly 
complex place and to try to understand it by asking simple questions is 
probably not a productive way to go. 

A Suggestion for Analysis . It is possible that some of our readers 
may be deterred from using complex methods for analyzing the relations 
between teacher behavior and pupil outcomes, including nonlinear relations 
and interactions, because of the apparent difficulty of the data analysis, 
even though they may find the rationale for doing so quite compelling. 
There are recent developments in this area that simplify the task consid- 
erably by making it possible to use multiple regression procedures to 
answer questions which used to be answered by analysis of variance. This
approach offers a degree of convenience and flexibility which is an
important practical advantage, as well as offering some theoretical
advantages (Cohen, 1968; Kelly, Beggs and McNeil, 1969; Walberg, 1971;
Kerlinger and Pedhazur, 1973). A widely available program suitable for such
analyses is the Step-Wise Multiple Regression, BMD02R, from the Biomedical
Computer Program Library (Dixon, 1974), which is available at most
computer centers.

There are also more complex approaches, forms of multivariate anal- 
ysis, which may be even more informative, but they require greater skill 
and more degrees of freedom to exploit their power. 

In an undertaking like program validation, in which most of the
effort and money goes into collecting the data and reducing it to an
appropriate form, the statistical analyses represent a small portion of the
total. To settle for an analysis which does not fully exploit the data
would be foolish economy. Complex phenomena need complex analysis if they
are to be understood.



Concluding Comments

We have presented a simple paradigm which separates the educational
process into identifiable points for assessment: teacher training,
teacher behavior or performance, pupil behavior, and pupil outcomes. The
paradigm permits specifying the point at which assessment may take place and
identifies lines of influence between these points. It makes clear, for
example, why studies which simply examine relations between teacher
training and pupil outcomes are not likely to be productive because there are
too many unidentified steps (and too many variables) in between. It
indicates that the major difference between PBTE and past practice is that
it shifts evaluation from the training program itself to the behavior of
the teacher who is graduated from that program. Teacher behavior becomes
the output of the training program and the input into the real-life
classroom. Program evaluation, then, examines the relation between training
and teacher behavior and program validation examines the relation between
teacher behavior and pupil behavior or pupil outcome.

Problems in using pupil outcomes as criteria for evaluating teachers 
were presented and the conclusion was drawn that the problems are dis- 
abling. Measures of even year-long growth of pupils are so unreliable that 
data based on gains of about 20 classes per teacher would be required to 
evaluate teachers with the customary minimum standards of reliability. 
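
A figure of this order can be reproduced with the Spearman-Brown formula, which gives the reliability of a mean based on several parallel measures. The sketch below is a worked illustration under assumed values, not a record of the calculation behind the figure quoted above: a reliability of .30 for the mean gain of a single class and a target of .90 are both hypothetical.

```python
def spearman_brown(single_reliability, k):
    """Reliability of the mean of k parallel measures (Spearman-Brown)."""
    return k * single_reliability / (1 + (k - 1) * single_reliability)


def classes_needed(single_reliability, target_reliability):
    """Number of classes whose averaged gain reaches the target reliability."""
    return (target_reliability * (1 - single_reliability)) / (
        single_reliability * (1 - target_reliability)
    )


single = 0.30   # assumed reliability of one class's mean gain (hypothetical)
target = 0.90   # assumed minimum standard for decisions about individuals

k = classes_needed(single, target)
check = spearman_brown(single, round(k))
print(round(k, 1), round(check, 2))
```

With these assumed values the required number comes to about 21 classes per teacher, which is far more than any practical evaluation scheme could collect.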

In the interest of practicality, teaching periods of a few hours or 
a few days are frequently used. But the problem of whether short-term 
gain relates to long-term gain, the probability that short-term gain will 
be measured at lower cognitive levels, and our lack of knowledge about 
whether teacher behaviors which promote learning at low cognitive levels 
also promote learning at high cognitive levels all raise question about 
these short-term approaches. There are the further difficulties that 
the objectives of education are many and complex, so that repeated measures 
for multiple outcomes would be required. Finally, and most importantly, 
to evaluate the teacher on the growth of the pupils is to base the fate 
of the teacher on the behavior of others over which he neither has
effective control nor, it seems to us, should have it. These are some of the
reasons why evaluating teachers by measuring the gains of their pupils
is impractical and probably inappropriate.

Rather, the better procedure for evaluating teachers would be the
measurement of teacher behavior, which is under his control to a greater
degree, although even this measurement is neither simple nor easy. Some
of the problems of specifying and measuring competencies may be eased,
however, by using a hierarchical organization in which specific
behavioral items are grouped and the need to talk about large numbers of
specific competencies is bypassed for most purposes. But when this procedure
is followed, it is critical to assure empirically that the items in each
group do, in fact, belong together.

While we recognize that program development will need to proceed on
the basis of current theory and knowledge, even though current knowledge
is weak, it is important to utilize the research knowledge which does
exist. Not only does it show that some concepts are weak and others in
error, but it suggests additional concepts and measures which go beyond
present theory and should be considered in the specification of
competencies.

But important as program development and evaluation are, we will
not in the long run know whether they have advanced education until we
know whether they make a difference to pupils--until we validate our
programs. We need to know empirically that the teacher behaviors which
programs teach do, in fact, produce the desired outcomes for the pupils
taught. This is the step in the process which has most often been omitted.
But the weakness of theory and research on which programs are now based,
coupled with the high cost of program development and the increasing
concern by the public for accountability in education, leave no
alternative to moving ahead without delay in this critical area.

The cost in time and money involved in the conduct of validation
research will be reduced materially if it is done in the context of an
ongoing program rather than separately from such a program. There is some
evidence that complex studies produce meaningful results when they bring 
together information about pupil, school, and community, along with mul- 
tiple low inference measures of teacher and pupil behavior, and multiple 
outcome measures, but only when the data are analyzed by procedures which 
can exploit their richness. While not simple or easy, such studies are 
feasible with the present "state of the art." This will be a difficult 
and expensive process, but minor compared to the difficulty and expense 
of program development and operation which we take for granted. 

While the assessment levels in the paradigm present fairly discrete
points for assessment, taken as a whole, the paradigm provides a dynamic
model in which training experiences or programs may be evaluated in terms
of teacher behaviors, and pupil behaviors and outcomes may be used to
validate teacher behaviors. The results of both processes can then be fed
back into program modification. This becomes a continuous process of
train, evaluate, validate, feedback, modify, train, etc. In this sense,
evaluation and validation represent a small investment whose potential
return appears to be great. This process, with its empirical feedback
loops, may be the key to a new era in education--one in which we really
begin to know what a teacher can do to help pupils learn and what a
program must do to teach these skills to teachers.

ABOUT AACTE



The American Association of Colleges for Teacher Education is an 
organization of more than 860 colleges and universities joined together 
in a common interest: more effective ways of preparing educational
personnel for our changing society. It is national in scope, institutional
in structure, and voluntary. It has served teacher education for 55 years
in professional tasks which no single institution, agency, organization,
or enterprise can accomplish alone. 

AACTE 's members are located in every state of the nation and in 
Puerto Rico, Guam, and the Virgin Islands. Collectively, they prepare 
more than 90 percent of the teaching force that enters American schools 
each year.

The Association maintains its headquarters in the National Center for 
Higher Education, in Washington, D. C. — the nation's capital, which 
also in recent years has become an educational capital. This location 
enables AACTE to work closely with many professional organizations and
government agencies concerned with teachers and their preparation.

In AACTE headquarters, a stable professional staff is in continuous
interaction with other educators and with officials who influence
education, both in immediate actions and future thrusts. Educators have come
to rely upon the AACTE headquarters office for information, ideas, and
other assistance and, in turn, to share their aspirations and needs.
Such interaction alerts the staff and officers to current and emerging
needs of society and of education and makes AACTE the center for teacher
education. The professional staff is regularly out in the field--nationally
and internationally--serving educators and keeping abreast of the
"real world." The headquarters office staff implements the Association's
objectives and programs, keeping them vital and valid.

Through conferences, study committees, commissions, task forces,
publications, and projects, AACTE conducts a program relevant to the
current needs of those concerned with better preparation programs for
educational personnel. Major programmatic thrusts are carried out by
commissions on international education, multicultural education, and
accreditation standards. Other activities include government relations
and a consultative service in teacher education.

A number of activities are carried on collaboratively. These
include major fiscal support for and selection of higher education
representatives on the National Council for Accreditation of Teacher Education--
an activity sanctioned by the National Commission on Accrediting and a
joint enterprise of higher education institutions represented by AACTE,
organizations of school board members, classroom teachers, state
certification officers, and chief state school officers.

The Association headquarters provides secretariat services for two
organizations which help make teacher education more interdisciplinary 
and comprehensive: the Associated Organizations of Teacher Education 
and the International Council on Education for Teaching. A major interest
in teacher education provides a common bond between AACTE and fraternal 
organizations. 

AACTE is deeply concerned with and involved in the major education 
issues of the day. Combining the considerable resources inherent in 
the consortium— constituted through a national voluntary association— 
with strengths of others creates a synergism of exceptional productivity 
and potentiality. Serving as the nerve center and spokesman for major 
efforts to improve education personnel, the Association brings to its 
task credibility, built-in cooperation and communications, contributions 
in cash and kind, and diverse staff and membership capabilities. 

AACTE provides a capability for energetically, imaginatively, and
effectively moving the nation forward through better prepared educational 
personnel. From its administration of the pioneering educational tele- 
vision program, "Continental Classroom," to its involvement of 20,000 
practitioners, researchers, and decision makers in developing the current 
Recommended Standards for Teacher Education, to many other activities,
AACTE has demonstrated its organizational and consortium qualifications
and experiences in conceptualizing, studying and experimenting,
communicating, and implementing diverse thrusts for carrying out socially and
educationally significant activities. With the past as prologue, AACTE 
is proud of its history and confident of its future among the "movers and 
doers" seeking continuous renewal of national aspirations and accomplish- 
ments through education. 

ABOUT THE TEXAS TEACHER CENTER PROJECT 



The AACTE Committee on Performance-Based Teacher Education serves 
as the national component of the Texas Teacher Center Project. This 
Project was initiated in July, 1970, through a grant to the Texas Educa- 
tion Agency from the Bureau of Educational Personnel Development, USOE. 
The Project was initially funded under the Trainers of Teacher Trainers 
(TTT) Program and the national component was subcontracted by the Texas 
Education Agency to AACTE. 

One of the original thrusts of the Texas Teacher Center Project was 
to conceptualize and field test performance-based teacher education pro- 
grams in pilot situations and contribute to a statewide effort to move 
teacher certification to a performance base. By the inclusion of the 
national component in the Project, the Texas Project made it possible for 
all efforts in the nation related to performance-based teacher education 
to gain national visibility. More important, it gave to the nation a 
central forum where continuous study and further clarification of the 
performance-based movement might take place. 

Although the Texas Teacher Center Project is of particular interest to 
AACTE's Performance-Based Teacher Education Committee, the services of 
the Committee are available, within its resources, to all states, colleges 
and universities, and groups concerned with the improvement of preparation 
programs for school personnel. 